A Linguistic Analysis of Student-Generated Paraphrases

Authors

  • Vasile Rus
  • Shi Feng
  • Russell D. Brandon
  • Scott A. Crossley
  • Danielle S. McNamara
Abstract

Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.

Similar resources

Linguistic Steganography Using Automatically Generated Paraphrases

This paper describes a method for checking the acceptability of paraphrases in context. We use the Google n-gram data and a CCG parser to certify the paraphrasing grammaticality and fluency. We collect a corpus of human judgements to evaluate our system. The ultimate goal of our work is to integrate text paraphrasing into a Linguistic Steganography system, by using paraphrases to hide informati...

A Linguistic Analysis of Expert-Generated Paraphrases

The authors used the computational tool Coh-Metrix to examine expert writers’ paraphrases and in particular, how experts paraphrase text passages using condensing strategies. The overarching goal of this study was to develop machine learning algorithms to aid in the automatic detection of paraphrases and paraphrase types. To this end, three experts were instructed to paraphrase by condensing a ...

Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases

Paraphrases of an expression are alternative linguistic expressions conveying the same information as the original. Technology for handling paraphrases has been attracting increasing attention due to its potential in a wide range of natural language processing applications; e.g., machine translation, information retrieval, question answering, summarization, authoring and revision support, and r...

Adding paraphrases of the same quality to the C-STAR BTEC

We present a method to expand a linguistic resource with paraphrases, which combines two techniques whose drawbacks neutralise reciprocally. The first step over-generates sentences by using analogy, while the second step over-eliminates erroneous sentences which do not meet a criterion on N-gram occurrences. In a practical experiment, we added 17,862 paraphrases to a linguistic resource of 97,7...

Learning Paraphrases to Improve a Question-Answering System

In this paper, we present a nearly unsupervised learning methodology for automatically extracting paraphrases from the Web. Starting with one single linguistic expression of a semantic relationship, our learning algorithm repeatedly samples the Web, in order to build a corpus of potential new examples of the same relationship. Sampling steps alternate with validation steps, during which implaus...

Publication date: 2011